VTV System

ABSTRACT

The following patent relates to an overall hardware configuration that produces an enhanced spatial television-like viewing experience. Unlike normal television, with this system the viewer is able to control both the viewing direction and relative position of the viewer with respect to the movie action. In addition to a specific hardware configuration, this patent also relates to a new video format which makes possible this virtual reality like experience.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of application Ser. No. 09/891,733, filed Jun. 25, 2001.

BACKGROUND AND SUMMARY OF THE INVENTION

The following patent relates to an overall hardware configuration that produces an enhanced spatial television-like viewing experience. Unlike normal television, with this system the viewer is able to control both the viewing direction and relative position of the viewer with respect to the movie action. In addition to a specific hardware configuration, this patent also relates to a new video format which makes possible this virtual reality like experience. Additionally, several proprietary video compression standards are also defined which facilitate this goal. The VTV system is designed to be an intermediary technology between conventional two-dimensional cinematography and true virtual reality. There are several stages in the evolution of the VTV system ranging from, in its most basic form, a panoramic display system to, in its most sophisticated form, full object based virtual reality utilizing animated texture maps and featuring live actors and/or computer-generated characters in a full “environment aware” augmented reality system.

As can be seen in FIG. 1, the overall VTV system consists of a central graphics processing device (the VTV processor), a range of video input devices (DVD, VCR, satellite, terrestrial television, remote video cameras), infrared remote control, digital network connection and several output device connections. In its most basic configuration, as shown in FIG. 2, the VTV unit would output imagery to a conventional television device. In such a configuration a remote control device (possibly infrared) would be used to control the desired viewing direction and position of the viewer within the VTV environment. The advantage of this “basic system configuration” is that it is implementable utilizing current audiovisual technology. The VTV graphics standard is a forwards compatible graphics standard which can be thought of as a “layer” above that of standard video. That is to say, conventional video represents a subset of the new VTV graphics standard. As a result of this standard's compatibility, VTV can be introduced without requiring any major changes in the television and/or audiovisual manufacturer's specifications. Additionally, VTV compatible television decoding units will inherently be compatible with conventional television transmissions.

In a more sophisticated configuration, as shown in FIG. 3, the VTV system uses a wireless HMD as the display device. In such a configuration, the wireless HMD can be used as a tracking device in addition to simply displaying images. This tracking information in the most basic form could consist of simply controlling the direction of view. In a more sophisticated system, both direction of view and position of the viewer within the virtual environment can be determined. Ultimately, in the most sophisticated implementation, remote cameras on the HMD will provide to the VTV system real world images which it will interpret into spatial objects; the spatial objects can then be replaced with virtual objects, thus providing an “environment aware” augmented reality system.

The wireless HMD is connected to the VTV processor by virtue of a wireless data link “Cybernet link”. In its most basic form this link is capable of transmitting video information from the VTV processor to the HMD and transmitting tracking information from the HMD to the VTV processor. In its most sophisticated form the cybernet link would transmit video information both to and from the HMD in addition to transferring tracking information from the HMD to the VTV processor. Additionally, certain components of the VTV processor may be incorporated in the remote HMD, thus reducing the data transfer requirement through the cybernet link. This wireless data link can be implemented in a number of different ways utilizing either analog or digital video transmission (in either an un-compressed or a digitally compressed format) with a secondary digitally encoded data stream for tracking information. Alternately, a purely digital unidirectional or bidirectional data link which carries both of these channels could be incorporated. The actual medium for data transfer would probably be microwave or optical. However, either transfer medium may be utilized as appropriate. The preferred embodiment of this system is one which utilizes on-board panoramic cameras fitted to the HMD in conjunction with image analysis hardware on board the HMD or possibly on the VTV base station to provide real-time tracking information. To further improve system accuracy, retroflective markers may also be utilized in the “real world environment”. In such a configuration, switchable light sources placed near to the optical axis of the on-board cameras would be utilized in conjunction with these cameras to form a “differential image analysis” system. Such a system features considerably higher recognition accuracy than one utilizing direct video images alone.

Ultimately, the VTV system will transfer graphic information utilizing a “universal graphics standard”. Such a standard will incorporate an object based graphics description language which achieves a high degree of compression by virtue of a “common graphics knowledge base” between subsystems. This patent describes in basic terms three levels of progressive sophistication in the evolution of this graphics language.

These three compression standards will, for the purpose of this patent, be described as:

a) c-com
b) s-com
c) v-com

In its most basic format the VTV system can be thought of as a 360 degree panoramic display screen which surrounds the viewer.

This “virtual display screen” consists of a number of “video Pages”. Encoded in the video image is a “Page key code” which instructs the VTV processor to place the graphic information into specific locations within this “virtual display screen”. As a result of this ability to place images dynamically it is possible to achieve the effective equivalent to both high-resolution and high frame rates without significant sacrifice to either. For example, only sections of the image which are rapidly changing require rapid image updates whereas the majority of the image is generally static. Unlike conventional cinematography in which key elements (which are generally moving) are located in the primary scene, the majority of a panoramic image is generally static.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram of an overall VTV system.

FIG. 2 is a schematic diagram of a VTV system according to its basic configuration.

FIG. 3 is a schematic diagram of a VTV system according to an advanced configuration.

FIG. 4 is an illustration of a cylindrical virtual display field.

FIG. 5 is an illustration of a truncated spherical virtual display field.

FIG. 6 is an illustration of a virtual representation of a 4 track sound system.

FIG. 7 is an illustration of a virtual representation of an 8 track sound system.

FIG. 8 is a depiction of a VTV memory map for a system utilizing both augmented reality memory and virtual reality memory.

FIG. 9 is a VTV graphics engine diagram showing the data write side of the VTV processor.

FIG. 10 is a VTV graphics engine diagram showing the data read side of the VTV processor.

FIG. 11 is an example of an analogue video compatible VTV encoded video line shown containing digital data.

FIG. 12 is an example of an analogue video compatible VTV encoded video line shown containing audio data.

FIG. 13 is a diagram of an optical tracking system for detecting changes in position and orientation.

FIG. 14 is a diagram of an optical tracking system for detecting azimuth changes in orientation.

FIG. 15 is a diagram of an optical tracking system for detecting elevation changes in orientation.

FIG. 16 is a diagram of an optical tracking system for detecting roll changes in orientation.

FIG. 17 is a diagram of an optical tracking system for detecting forwards/backwards changes in position.

FIG. 18 is a diagram of an optical tracking system for detecting left/right changes in position.

FIG. 19 is a diagram of an optical tracking system for detecting up/down changes in position.

FIG. 20 is a block diagram of hardware for an optical tracking system according to a simplified version.

FIG. 21 is a table showing one possible configuration of VTV digital header data.

VTV GRAPHICS STANDARD

In its most basic form the VTV graphics standard consists of a virtual 360 degree panoramic display screen upon which video images can be rendered from an external video source such as VCR, DVD, satellite, camera or terrestrial television receiver such that each video frame contains not only the video information but also information that defines its location within the virtual display screen. Such a system is remarkably versatile as it provides not only variable resolution images but also frame rate independent imagery. That is to say, the actual update rate within a particular virtual image (entire virtual display screen) may vary within the display screen itself. This is inherently accomplished by virtue of each frame containing its virtual location information. This allows active regions of the virtual image to be updated quickly at the nominal perception cost of not updating sections of the image which have little or no change. Such a system is shown in FIG. 4.
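The Page placement described above can be sketched in Python. This is a minimal illustration only: it assumes (hypothetically) that Pages tile the panorama side by side in azimuth and that the Page key code has already been decoded to an integer Page number. The class and method names are illustrative and are not part of the VTV standard.

```python
class VirtualDisplay:
    """Panoramic frame buffer made of equally sized Pages (assumed layout)."""

    def __init__(self, num_pages, page_width, page_height):
        self.num_pages = num_pages
        self.page_width = page_width
        self.page_height = page_height
        width = num_pages * page_width
        self.pixels = [[0] * width for _ in range(page_height)]
        # Frame counter of the most recent update of each Page, so that
        # different regions may be refreshed at different rates.
        self.last_update = [None] * num_pages

    def write_page(self, page_number, page_pixels, frame_counter):
        """Copy one decoded video frame into the slot named by its Page number."""
        if not 0 <= page_number < self.num_pages:
            raise ValueError("Page number out of range")
        if len(page_pixels) != self.page_height or any(
            len(row) != self.page_width for row in page_pixels
        ):
            raise ValueError("Page has the wrong dimensions")
        x0 = page_number * self.page_width
        for y, row in enumerate(page_pixels):
            self.pixels[y][x0:x0 + self.page_width] = row
        self.last_update[page_number] = frame_counter
```

Because each incoming frame carries its own location, a rapidly changing Page can be rewritten every frame while static Pages are left untouched.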

To further improve the realism of the imagery, the basic VTV system can be enhanced to the format shown in FIG. 5. In this configuration the cylindrical virtual display screen is interpreted by the VTV processor as a truncated sphere. This effect can be easily generated through the use of a geometry translator or “Warp Engine” within the digital processing hardware component of the VTV processor.

Due to constant variation of absolute planes of reference, mobile camera applications (either HMD based or Pan-Cam based) require additional tracking information for azimuth and elevation of the camera system to be included with the visual information in order that the images can be correctly decoded by the VTV graphics engine. In such a system, absolute camera azimuth and elevation become part of the image frame information. There are several possible techniques for the interpretation of this absolute reference data. Firstly, the coordinate data could be used to define the origins of the image planes within the memory during the memory writing process. Unfortunately this approach will tend to result in remnant image fragments being left in memory from previous frames with different alignment values. A more practical solution is simply to write the video information into memory with an assumed reference point of 0 azimuth, 0 elevation. This video information is then correctly displayed by correcting the display viewport for the camera angular offsets. One possible data format for such a system is shown in FIG. 11 and FIG. 21.
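The second technique, writing at an assumed 0 azimuth, 0 elevation and correcting at read-out, can be illustrated with a short Python sketch. The function name, the use of degrees and the sign convention are assumptions made for illustration; the specification does not fix them.

```python
def corrected_viewport(view_azimuth, view_elevation,
                       camera_azimuth, camera_elevation):
    """Return the viewport centre, in memory coordinates (degrees).

    The image was written as if the camera pointed at (0, 0), so the
    camera's recorded angular offsets are subtracted from the desired
    viewing direction. Azimuth wraps around the full 360 degree panorama.
    """
    memory_azimuth = (view_azimuth - camera_azimuth) % 360.0
    memory_elevation = view_elevation - camera_elevation
    return memory_azimuth, memory_elevation
```

Under this convention nothing in memory has to be moved when the camera turns; only the read address changes, which avoids the remnant image fragments mentioned above.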

Audio Standards:

In addition to 360 degree panoramic video, the VTV standard also supports either 4 track (quadraphonic) or 8 track (octaphonic) spatial audio. A virtual representation of the 4 track system is shown in FIG. 6. In the case of the simple 4 track audio system, sound through the left and right speakers of the sound system (or headphones, in the case of an HMD based system) is scaled according to the azimuth of the view port (direction of view within the VR environment). In the case of the 8 track audio system, sound through the left and right speakers of the sound system (or headphones, in the case of an HMD based system) is scaled according to both the azimuth and elevation of the view port, as shown in the virtual representation of the system, FIG. 7.
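One possible way to scale four tracks into left and right outputs by view port azimuth is sketched below in Python. The track directions (front, right, back, left) and the sine panning law are assumptions chosen for illustration; the specification does not prescribe either.

```python
import math

# Assumed directions of the four tracks, in degrees clockwise from front.
TRACK_AZIMUTHS = (0.0, 90.0, 180.0, 270.0)


def mix_quadraphonic(samples, view_azimuth):
    """Mix one sample from each of four tracks into (left, right).

    A track lying to the viewer's right is weighted towards the right
    output, a track to the left towards the left output, and tracks
    directly ahead or behind are shared equally.
    """
    left = right = 0.0
    for sample, track_azimuth in zip(samples, TRACK_AZIMUTHS):
        relative = math.radians(track_azimuth - view_azimuth)
        pan = math.sin(relative)  # -1 = fully left, +1 = fully right
        left += sample * (1.0 - pan) / 2.0
        right += sample * (1.0 + pan) / 2.0
    return left, right
```

With the viewer facing front, the "right" track is heard only in the right output; after the viewer turns through 180 degrees the same track is heard only in the left output.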

In its most basic form, the VTV standard encodes the multi-track audio channels as part of the video information in a digital/analogue hybrid format as shown in FIG. 12. As a result, video compatibility with existing equipment can be achieved. As can be seen in this illustration, the audio data is stored in a compressed analogue coded format such that each video scan line contains 512 audio samples. In addition to this analogue coded audio information, each audio scan line contains a three bit digital code that is used to “pre-scale” the audio information. That is to say that the actual audio sample value is X*S where X is the pre-scale number and S is the sample value. Using this dual-coding scheme the dynamic range of the audio system can be extended from about 43 dB to over 60 dB. Moreover, this extending of the dynamic range is done at relatively “low cost” to the audio quality because we are relatively insensitive to audio distortion when the overall signal level is high. The start bit is an important component in the system. Its function is to set the maximum level for the scan line (i.e. the 100% or white level). This level in conjunction with the black level (this can be sampled just after the colour burst) forms the 0% and 100% range for each line. By dynamically adjusting the 0% and 100% marks for each line on a line by line basis, the system becomes much less sensitive to variations in black level due to AC-coupling of video sub modules and/or recording and play back of the video media, in addition to improving the accuracy of the decoding of the digital component of the scan line.
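The per-line normalisation and pre-scaling can be sketched in Python as follows. The function names are hypothetical, the sample S is assumed to be normalised to the range 0 to 1 between the line's black and white levels, and the mapping from the three bit code to the pre-scale number X is not given in the text, so X is passed in directly.

```python
import math


def decode_audio_sample(raw, black_level, white_level, prescale):
    """Return X*S for one analogue coded audio sample.

    raw, black_level and white_level are measurements taken from the
    same scan line, so drift in the black level cancels out line by line.
    """
    if white_level <= black_level:
        raise ValueError("white level must exceed black level")
    s = (raw - black_level) / (white_level - black_level)
    s = min(1.0, max(0.0, s))  # clamp to the 0% to 100% range
    return prescale * s


def added_dynamic_range_db(max_prescale):
    """Extra dynamic range, in dB, gained from a pre-scale factor."""
    return 20.0 * math.log10(max_prescale)
```

As a rough consistency check under the assumption that the largest pre-scale number is 8, the added range is about 18 dB, which would take a 43 dB system to roughly 61 dB, in line with the "over 60 dB" figure quoted above.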

In addition to this pre-scaling of the digital information, an audio control bit (AS) is included in each field (at line 21). This control bit sets the audio buffer sequence to 0 when it is set. This provides a way to synchronize the 4 or 8 track audio information so that the correct track is always being updated from the current data regardless of the sequence of the video Page updates.

In more sophisticated multimedia data formats such as computer AV files and digital television transmissions, these additional audio tracks could be stored in other ways which may be more efficient or otherwise advantageous.

It should be noted that, in addition to its use as an audiovisual device, this spatial audio system/standard could also be used in audio only mode by the combination of a suitable compact tracking device and a set of cordless headphones to realize a spatial-audio system for advanced hi-fi equipment.

Enhancements:

In addition to this simplistic graphics standard, there are a number of enhancements which can be used alone or in conjunction with the basic VTV graphics standard. Three such graphics standards will be described in detail in subsequent patents; however, for the purpose of this patent, they are known as:

a) c-com
b) s-com
c) v-com

The first two standards relate to the definitions of spatial graphics objects whereas the third graphics standard relates to a complete VR environment definition language which utilizes the first two standards as a subset and incorporates additional environment definitions and control algorithms.

The VTV graphic standard (in its basic form) can be thought of as a control layer above that of the conventional video standard (NTSC, PAL etc.). As such, it is not limited purely to conventional analog video transmission standards. Using basically identical techniques, the VTV standard can operate with the HDTV standard as well as many of the computer graphic and industry audiovisual standards.

VTV Processor:

The VTV graphics processor is the heart of the VTV system. In its most basic form this module is responsible for the real-time generation of the graphics which are output to the display device (either conventional TV/HDTV or HMD). In addition to digitizing raw graphics information input from a video media provision device such as VCR, DVD, satellite, camera or terrestrial television receiver, more sophisticated versions of this module may render graphics in real time from a “universal graphics language” passed to it via the Internet or other network connection. In addition to this digitizing and graphics rendering task, the VTV processor can also perform image analysis. Early versions of this system will use this image analysis function for the purpose of determining tracking coordinates of the HMD. More sophisticated versions of this module will, in addition to providing this tracking information, also interpret the real world images from the HMD as physical three-dimensional objects. These three-dimensional objects will be defined in the universal graphics language, which can then be recorded or communicated to similar remote display devices via the Internet or other network, or alternatively be replaced by other virtual objects of similar physical size, thus creating a true augmented reality experience.

The VTV hardware itself consists of a group of sub modules as follows:

a) video digitizing module
b) Augmented Reality Memory (ARM)
c) Virtual Reality Memory (VRM)
d) Translation Memory (TM)
e) digital processing hardware
f) video generation module

The exact configuration of these modules is dependent upon other external hardware. For example, if digital video sources are used then the video digitizing module becomes relatively trivial and may consist of no more than a group of latches or a FIFO buffer. However, if composite or Y/C video inputs are utilized then additional hardware is required to convert these signals into digital format. Additionally, if a digital HDTV signal is used as the video input source then an HDTV decoder is required as the front end of the system (as HDTV signals cannot be processed in compressed format).

In the case of a field based video system such as analogue TV, the basic operation of the VTV graphics engine is as follows:

a) Video information is digitized and placed in the augmented reality memory on a field by field basis assuming an absolute Page reference of 0 degree azimuth, 0 degree elevation, with the origin of each Page being determined by the state of the Page number bits (P3-P0).

b) Auxiliary video information for background and/or floor/ceiling maps is loaded into the virtual reality memory on a field by field basis dependent upon the state of the “field type” bits (F3-F0) and Page number bits (P3-P0).

c) The digital processing hardware interprets this information held in augmented reality and virtual reality memory and, utilizing a combination of a geometry processing engine (Warp Engine), digital subtractive image processing and a new versatile form of “blue-screening”, translates and selectively combines this data into an image substantially similar to that which would be seen by the viewer if they were standing in the same location as that of the panoramic camera when the video material was filmed. The main differences between this image and that available utilizing conventional video techniques are that it is not only 360 degree panoramic but also has the ability to have elements of both virtual reality and “real world” imagery melded together to form a complex immersive augmented reality experience.

d) The exact way in which the virtual reality and “real world imagery” is combined depends upon the mode that the VTV processor is operating in and is discussed in more detail in later sections of this specification. The particular VTV processor mode is determined by additional control information present in the source media and thus the processing and display modes can change dynamically while displaying a source of VTV media.

e) The video generation module then generates a single or pair of video images for display on a conventional television or HMD display device. Although the VTV image field will be updated at less than full frame rates (unless multi-spin DVD devices are used as the image media), graphics rendering will still occur at full video frame rates, as will the updates of the spatial audio. This is possible because each “Image Sphere” contains all of the required information for both video and audio for any viewer orientation (azimuth and elevation).

As can be seen in FIG. 9, the memory write side of the VTV processor shows two separate video input stages (ADCs). It should be noted that although ADC-0 would generally be used for live panoramic video feeds and ADC-2 would generally be used for virtual reality video feeds from pre-rendered video material, both video input stages have full access to both augmented reality and virtual reality memory (i.e. they use a memory pool). This hardware configuration allows for more versatility in the design and allows several unusual display modes (which will be covered in more detail in later sections). Similarly, the video output stages (DAC-0 and DAC-1) have total access to both virtual and augmented reality memory.

Although having two input and two output stages improves the versatility of the design, the memory pool style of design means that the system can function with either one or two input and/or output stages (although with reduced capabilities) and as such the presence of either one or two input or output stages in a particular implementation should not limit the generality of the specification.

For ease of design, high-speed static RAM was utilized as the video memory in the prototype device. However, other memory technologies may be utilized without limiting the generality of the design specification.

In the preferred embodiment, the digital processing hardware would take the form of one or more field programmable logic arrays or a custom ASIC. The advantage of using field programmable logic arrays is that the hardware can be updated at any time. The main disadvantage of this technology is that it is not quite as fast as an ASIC. Alternatively, high speed conventional digital processors may also be utilized to perform this image analysis and/or graphics generation task.

As previously described, certain sections of this hardware may be incorporated in the HMD, possibly even to the point at which the entire VTV hardware exists within the portable HMD device. In such a case the VTV base station hardware would act only as a link between the HMD and the Internet or other network, with all graphics image generation, image analysis and spatial object recognition occurring within the HMD itself.

Note: The low order bits of the viewport address generator are run through a look up table address translator for the X and Y image axes which imposes barrel distortion on the generated images. This provides the correct image distortion for the current field of view for the viewport. This hardware is not shown explicitly in FIG. 10 because it will probably be implemented within an FPGA or ASIC logic and thus comprises a part of the viewport address generator functional block. Likewise, roll of the final image will likely be implemented in a similar fashion.

It should be noted that only viewport-0 is affected by the translation engine (Warp Engine); Viewport-1 is read out undistorted. This is necessary when using the superimpose and overlay augmented reality modes because VR-video material being played from storage has already been “flattened” (i.e. pincushion distorted) prior to being stored whereas the live video from the panoramic cameras on the HMD requires distortion correction prior to being displayed by the system in Augmented Reality mode. After this preliminary distortion, images recorded by the panoramic cameras in the HMD should be geometrically accurate and suitable for storage as new VR material in their own right (i.e. they can become VR material). One of the primary roles of the Warp Engine is then to provide geometry correction and trimming of the panoramic cameras on the HMD. This includes the complex task of providing a seamless transition between camera views.

Exception Processing:

As can be seen in FIGS. 4 and 5, a VTV image frame consists of either a cylinder or a truncated sphere. This space subtends only a finite vertical angle to the viewer (+/−45 degrees in the prototype). This is an intentional limitation designed to make the most of the available data bandwidth of the video storage and transmission media and thus maintain compatibility with existing video systems. However, as a result of this compromise, there can exist a situation in which the view port exceeds the scope of the image data. There are several different ways in which this exception can be handled. The simplest is to make out of bounds video data black. This will give the appearance of being in a room with a black ceiling and floor. However, an alternative and preferable configuration is to use a secondary video memory store to hold a full 360 degree*180 degree background image map at reduced resolution. This memory area is known as Virtual reality memory (VRM). The basic memory map for the system utilizing both augmented reality memory and virtual reality memory (in addition to translation memory) is shown in FIG. 8. As can be seen in this illustration, the translation memory area must have sufficient range to cover a full 360 degree*180 degree and ideally have the same angular resolution as that of the augmented reality memory bank (which covers 360 degree*90 degree). With such a configuration, it is possible to provide both floor and ceiling exception handling and variable transparency imagery such as looking through windows in the foreground and showing the background behind them. The backgrounds can be either static or dynamic and can be updated in basically the same way as foreground (augmented reality memory) by utilizing a Paged format.
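The two exception-handling strategies above can be sketched in Python. The function and parameter names are hypothetical; the two memory banks are represented by caller-supplied lookup functions, and the 45 degree limit is the prototype figure quoted in the text.

```python
ARM_ELEVATION_LIMIT = 45.0  # degrees; the prototype's vertical coverage


def sample_display(azimuth, elevation, read_arm, read_vrm=None, black=0):
    """Fetch the pixel seen in a given direction.

    read_arm and read_vrm are callables taking (azimuth, elevation) in
    degrees. If the direction lies outside augmented reality memory the
    pixel is taken from virtual reality memory when one is supplied,
    otherwise it is rendered black.
    """
    if not -90.0 <= elevation <= 90.0:
        raise ValueError("elevation must lie within +/-90 degrees")
    azimuth = azimuth % 360.0
    if abs(elevation) <= ARM_ELEVATION_LIMIT:
        return read_arm(azimuth, elevation)
    if read_vrm is not None:
        return read_vrm(azimuth, elevation)
    return black
```

Omitting `read_vrm` reproduces the "black ceiling and floor" behaviour, while supplying it gives the preferred configuration with a reduced resolution background map.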

Modes of Operation:

The VTV system has two basic modes of operation. Within these two modes there also exist several sub modes. The two basic modes are as follows:

a) Augmented reality mode
b) Virtual reality mode

Augmented Reality Mode 1:

In augmented reality mode 1, selective components of “real world imagery” are overlaid upon a virtual reality background. In general, this process involves first removing all of the background components from the “real world” imagery. This can be easily done by using differential imaging techniques, i.e. by comparing current “real world” imagery against a stored copy taken previously and detecting differences between the two. After the two images have been correctly aligned, the regions that differ are new or foreground objects and those that remain the same are static background objects. This is the simplest of the augmented reality modes and is generally not sufficiently interesting as most of the background will be removed in the process. It should be noted that, when operated in mobile Pan-Cam (telepresence) or augmented reality mode, the augmented reality memory will generally be updated in sequential Page order (i.e. updated in whole system frames) rather than by random Page updates. This is because constant variations in the position and orientation of the panoramic camera system during filming will probably cause mis-matches in the image Pages if they are handled separately.
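The differential imaging step can be illustrated with a small Python sketch. It assumes greyscale images held as lists of rows that have already been aligned, and uses a simple per-pixel threshold; the function names and the threshold value are illustrative only.

```python
def foreground_mask(current, reference, threshold=16):
    """True where the current image differs from the stored reference."""
    return [
        [abs(c - r) > threshold for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


def overlay_foreground(current, reference, virtual_background, threshold=16):
    """Keep changed ("foreground") pixels; replace the rest with VR imagery."""
    mask = foreground_mask(current, reference, threshold)
    return [
        [c if m else b for c, m, b in zip(cur_row, mask_row, bg_row)]
        for cur_row, mask_row, bg_row in zip(current, mask, virtual_background)
    ]
```

Pixels that match the stored copy are treated as static background and replaced by the virtual reality background, while pixels that differ are kept as foreground objects.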

Augmented Reality Mode 2:

Augmented reality mode 2 differs from mode 1 in that, in addition to automatically extracting foreground and moving objects and placing these in an artificial background environment, the system also utilizes the Warp Engine to “push” additional “real world” objects into the background. In addition to simply adding these “real world” objects into the virtual environment, the Warp Engine is also capable of scaling and translating these objects so that they match into the virtual environment more effectively. These objects can be handled as opaque overlays or transparencies.

Augmented Reality Mode 3:

Augmented reality mode 3 differs from mode 2 in that, in this case, the Warp Engine is used to “pull” the background objects into the foreground to replace “real world” objects. As in mode 2, these objects can be translated and scaled and can be handled as either opaque overlays or transparencies. This gives the user the ability to “match” the physical size and position of a “real world” object with a virtual object. By doing so, the user is able to interact and navigate within the augmented reality environment as they would in the “real world” environment. This mode is probably the most likely mode to be utilized for entertainment and gaming purposes as it would allow a Hollywood production to be brought into the user's own living room.

Enhancements:

Clearly the key to making augmented reality modes 2 and 3 operate effectively is a fast and accurate optical tracking system. Theoretically, it is possible for the VTV processor to identify and track “real world” objects in real-time. However, this is a relatively complex task, particularly as object geometry changes greatly with changes in the viewer's physical position within the “real world” environment, and as such, simple auto correlation type tracking techniques will not work effectively. In such a situation, tracking accuracy can be greatly improved by placing several retroflective targets on key elements of the objects in question. Such retroflective targets can easily be identified by utilizing relatively simple differential imaging techniques.

Virtual Reality Mode:

Virtual reality mode is a functionally simpler mode than the previous augmented reality modes. In this mode “pre-filmed” or computer-generated graphics are loaded into augmented reality memory on a random Page by Page basis. This is possible because the virtual camera planes of reference are fixed. As in the previous examples, virtual reality memory is loaded with a fixed or dynamic background at a lower resolution. The use of both foreground and background image planes makes possible more sophisticated graphics techniques such as motion parallax.

Enhancements:

The versatility of virtual reality memory (background memory) can be improved by utilizing an enhanced form of “blue-screening”. In such a system, a sample of the “chroma-key” color is provided at the beginning of each scan line in the background field (area outside of the active image area). This provides a versatile system in which any color is allowable in the image. Thus, by surrounding individual objects with the “transparent” chroma-key color, problems and inaccuracies associated with the “cutting and pasting” of this object by the Warp Engine are greatly reduced. Additionally, the use of “transparent” chroma-keyed regions within foreground virtual reality images allows easy generation of complex sharp edged and/or dynamic foreground regions with no additional information overhead.
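A per-line chroma-key of this kind might be sketched in Python as follows. The layout is an assumption made for illustration: the first pixel of each keyed scan line is taken to be the key colour sample and the remaining pixels the active image, with pixels as (R, G, B) tuples. The function name and tolerance parameter are hypothetical.

```python
def composite_scan_line(keyed_line, behind_line, tolerance=0):
    """Composite one keyed scan line over the line behind it.

    keyed_line[0] carries the key colour for this line only, so the
    "transparent" colour may differ from line to line and any colour
    remains usable somewhere in the image.
    """
    if not keyed_line:
        raise ValueError("keyed line must contain the key sample")
    key = keyed_line[0]
    active = keyed_line[1:]
    if len(active) != len(behind_line):
        raise ValueError("line lengths do not match")

    def is_key(pixel):
        return all(abs(p - k) <= tolerance for p, k in zip(pixel, key))

    return [
        behind if is_key(front) else front
        for front, behind in zip(active, behind_line)
    ]
```

A non-zero tolerance is one way to allow for noise in an analogue source; an exact match (tolerance 0) suits digitally generated material.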

The Camera System:

As can be seen in the definition of the graphic standard, additional Page placement and tracking information is required for the correct placement and subsequent display of the imagery captured by mobile Pan-Cam or HMD based video systems. Additionally, if spatial audio is to be recorded in real-time then this information must also be encoded as part of the video stream. In the case of computer-generated imagery this additional video information can easily be inserted at render-stage. However, in the case of live video capture, this additional tracking and audio information must be inserted into the video stream prior to recording. This can effectively be achieved through a graphics processing module hereinafter referred to as the VTV encoder module.

Image Capture:

In the case of imagery collected by mobile panoramic camera systems, the images are first processed by a VTV encoder module. This device provides video distortion correction and also inserts video Page information, orientation tracking data and spatial audio into the video stream. This can be done without altering the video standard, thereby maintaining compatibility with existing recording and playback devices. Although this module could be incorporated within the VTV processor, having this module as a separate entity is advantageous for use in remote camera applications where the video information must ultimately be either stored or transmitted through some form of wireless network.

Tracking System:

For any mobile panoramic camera system such as a “Pan-Cam” or HMD based camera system, tracking information must comprise part of the resultant video stream in order that an “absolute” azimuth and elevation coordinate system be maintained. In the case of computer-generated imagery this data is not required as the camera orientation is a theoretical construct known to the computer system at render time.

The Basic System:

The basic tracking system of the VTV HMD utilizes on-board panoramic video cameras to capture the required 360 degree visual information of the surrounding real world environment. This information is then analyzed by the VTV processor (whether it exists within the HMD or as a base station unit) utilizing computationally intensive yet relatively algorithmically simple techniques such as auto correlation. Examples of a possible algorithm are shown in FIGS. 13-19.

The simple tracking system outlined in FIGS. 13-19 detects only changes in position and orientation. With the addition of several retroflective targets, which can be easily distinguished from the background images using differential imaging techniques, it is possible to gain absolute reference points. Such absolute reference points would probably be located at the extremities of the environmental region (i.e. confines of the user space); however, they could be placed anywhere within the real environment, provided the VTV hardware is aware of the real world coordinates of these markers. The combination of these absolute reference points and differential movement (from the image analysis data) makes possible the generation of absolute real world coordinate information at full video frame rates. As an alternative to the placement of retroflective targets at known spatial coordinates, active optical beacons could be employed. These devices would operate in a similar fashion to the retroflective targets in that they would be configured to strobe light in synchronism with the video capture rate, thus allowing differential video analysis to be performed on the resultant images. However, unlike passive retroflective targets, active optical beacons could, in addition to strobing in time with the video capture, transmit additional information describing their real world coordinates to the HMD. As a result, the system would not have to explicitly know the locations of these beacons as this data could be extracted “on the fly”. Such a system is very versatile and somewhat more rugged than the simpler retroflective configuration.
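The differential imaging step mentioned above can be sketched as a simple frame subtraction. The sketch assumes two grayscale frames of equal size, one captured with the strobed light source on and one with it off, held as lists of rows of integer brightness values. The function name `find_targets` and the threshold value are illustrative assumptions, not details taken from the specification.

```python
def find_targets(lit_frame, unlit_frame, threshold=50):
    """Return (row, col) of pixels that are much brighter with the strobe on.

    Static scene content, however bright, is nearly identical in both
    frames and cancels in the subtraction; only reflectors that return
    the strobed light produce a large difference.
    """
    hits = []
    for r, (lit_row, unlit_row) in enumerate(zip(lit_frame, unlit_frame)):
        for c, (lit, unlit) in enumerate(zip(lit_row, unlit_row)):
            if lit - unlit > threshold:
                hits.append((r, c))
    return hits


# Pixel (0, 1) is a bright background highlight; pixel (1, 2) is a target.
unlit = [[10, 200, 12], [11, 10, 13], [10, 12, 11]]
lit = [[12, 201, 10], [10, 10, 240], [11, 11, 12]]
targets = find_targets(lit, unlit)
```

In this example only the pixel at row 1, column 2 is reported, while the bright but unchanging highlight at row 0, column 1 is rejected.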

Note: FIG. 20 shows a simplistic representation of the tracking hardware in which the auto correlators simply detect the presence or absence of a particular movement. A practical system would probably incorporate a number of auto correlators for each class of movement (for example there may be 16 or more separate auto correlators to detect horizontal movement). Such a system would then be able to detect different levels or amounts of movement in all of the directions.
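A bank of correlators for one class of movement might be sketched as follows, with each correlator tuned to one candidate horizontal displacement and the best-matching correlator indicating the amount of movement. This is only a software illustration: mean absolute difference is used here as a simple stand-in for the correlation measure, and the names `correlator_response` and `detect_shift` are hypothetical. The sketch assumes `max_shift` is smaller than the strip length.

```python
def correlator_response(prev, curr, shift):
    """Mismatch between prev and curr displaced by `shift` pixels (lower is better)."""
    n = len(prev)
    diffs = [abs(prev[i] - curr[i + shift]) for i in range(n) if 0 <= i + shift < n]
    return sum(diffs) / len(diffs)


def detect_shift(prev, curr, max_shift):
    """Poll one correlator per candidate shift and return the best-matching shift."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifts, key=lambda s: correlator_response(prev, curr, s))


# One horizontal strip of brightness values; the highlight moves 2 pixels right.
prev_strip = [0, 0, 5, 9, 5, 0, 0, 0]
curr_strip = [0, 0, 0, 0, 5, 9, 5, 0]
movement = detect_shift(prev_strip, curr_strip, max_shift=3)
```

With `max_shift=3` the bank holds seven correlators; a bank of 16 or more would simply widen or refine the range of displacements that can be distinguished.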

Alternate Configurations:

An alternative implementation of this tracking system is possible utilizing a similar image analysis technique to track a pattern on the ceiling to achieve spatial positioning information and simple “tilt sensors” to detect angular orientation of the HMD/Pan-Cam system. The advantage of this system is that it is considerably simpler and less expensive than the full six axis optical tracker previously described. The fact that the ceiling is at a constant distance and known orientation from the HMD greatly simplifies the optical system, the quality of the required imaging device and the complexity of the subsequent image analysis. As in the previous six-axis optical tracking system, this spatial positioning information is inherently in the form of relative movement only. However, the addition of “absolute reference points” allows such a system to re-calibrate its absolute references and thus achieve an overall absolute coordinate system. This absolute reference point calibration can be achieved relatively easily utilizing several different techniques. The first, and perhaps simplest, technique is to use color sensitive retroflective spots as previously described. Alternately, active optical beacons (such as LED beacons) could also be utilized. A further alternative absolute reference calibration system which could be used is based on a bi-directional infrared beacon. Such a system would communicate a unique code between the HMD and the beacon, such that calibration would occur only once each time the HMD passed under any of these “known spatial reference points”. This is required to avoid “dead tracking regions” within the vicinity of the calibration beacons due to multiple origin resets.
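The "calibrate only once per pass" behavior can be sketched as a small latch around a relative tracker. The class name `BeaconCalibrator`, the two-dimensional position and the single-code detection input are all simplifying assumptions; in particular, the bi-directional code exchange is reduced here to the tracker being told which beacon code, if any, is currently in range.

```python
class BeaconCalibrator:
    """Relative position tracker that resets once per pass under a beacon."""

    def __init__(self, beacon_positions, start=(0.0, 0.0)):
        self.beacon_positions = beacon_positions  # beacon code -> known (x, y)
        self.position = start
        self.latched_code = None  # beacon already used during the current pass

    def update(self, delta, beacon_code=None):
        """Apply one relative movement; return True if the origin was reset."""
        x, y = self.position
        self.position = (x + delta[0], y + delta[1])
        if beacon_code is None:
            self.latched_code = None  # left the beacon: re-arm for the next pass
            return False
        if beacon_code == self.latched_code:
            return False  # same pass: keep tracking relative movement
        self.position = self.beacon_positions[beacon_code]
        self.latched_code = beacon_code
        return True


tracker = BeaconCalibrator({"B1": (5.0, 5.0)})
resets = [
    tracker.update((1.0, 0.0)),         # open floor, relative tracking only
    tracker.update((1.0, 0.0), "B1"),   # first sighting: origin reset
    tracker.update((0.5, 0.0), "B1"),   # still under the beacon: no second reset
    tracker.update((0.5, 0.0)),         # beacon out of range
    tracker.update((-0.5, 0.0), "B1"),  # new pass: reset again
]
```

Without the latch, every frame captured under the beacon would snap the position back to the beacon coordinates, so small movements in that vicinity would be lost; this is the "dead tracking region" the text refers to.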

Simplifications:

The basic auto correlation technique used to locate movement within the image can be simplified into reasonably straightforward image processing steps. Firstly, rotation detection can be simplified into a group of lateral shifts (up, down, left, right) symmetrical around the center of the image (optical axis of the camera). Additionally, these “sample points” for lateral movement do not necessarily have to be very large. They do, however, have to contain unique picture information. For example, a blank featureless wall will yield no useful tracking information. However, an image with high contrast regions such as edges of objects or bright highlight points is relatively easily tracked. Taking this thinking one step further, it is possible to first reduce the entire image into highlight points/edges. The image can then be processed as a series of horizontal and vertical strips such that auto correlation regions are bounded between highlight points/edges. Additionally, small highlight regions can very easily be tracked by comparing previous image frames against current images and determining “closest possible fit” between the images (i.e. minimum movement of highlight points). Such techniques are relatively easy and well within the capabilities of most moderate speed micro-processors, provided some of the image pre-processing overhead is handled by hardware.
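The reduction of rotation detection to symmetrical lateral shifts can be illustrated with a small-angle sketch. It assumes four sample points placed at equal distance `radius` above, below, left and right of the image center, a coordinate system with x to the right and y upward, and counterclockwise rotation taken as positive. The function name `decompose_motion` and these conventions are assumptions made for illustration only.

```python
def decompose_motion(dx_top, dx_bottom, dy_left, dy_right, radius):
    """Split four lateral shifts into a common translation and a rotation.

    For a small counterclockwise rotation theta about the center, a point
    at (x, y) moves by roughly (-theta * y, theta * x). The top and bottom
    points therefore shift horizontally in opposite directions, and the
    left and right points shift vertically in opposite directions, while
    a pure translation moves each symmetrical pair identically.
    """
    tx = (dx_top + dx_bottom) / 2  # common-mode horizontal shift
    ty = (dy_left + dy_right) / 2  # common-mode vertical shift
    # Differential shifts, averaged over both point pairs, give the angle in radians.
    rotation = ((dx_bottom - dx_top) + (dy_right - dy_left)) / (4 * radius)
    return tx, ty, rotation


# Shifts produced by a translation of (3, -2) pixels plus a 0.01 rad rotation,
# measured at sample points 100 pixels from the center.
tx, ty, rotation = decompose_motion(2.0, 4.0, -3.0, -1.0, radius=100.0)
```

Each of the four inputs could come from a lateral shift detector operating on a small sample region, so no dedicated rotation correlator is required under these assumptions.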

1) An interactive image capture and display system comprising
   a) an image input means including an array of electronic image capture devices distributed in a horizontal plane such that their fields of view partially overlap and collectively cover a full 360 degrees; and
   b) an image storage and playback means compatible with existing television standards;
   c) a signal processing means including
      1) a means of producing graphical imagery depicting a panoramic image such that said panoramic image is composed of a plurality of smaller image sections;
      2) a means for cropping, distorting and aligning individual images produced by the said image capture devices to produce an overall 360 degree panoramic image with negligible distortion and overlap between the individual image sections and wherein each pixel in the resulting 360 degree panoramic image has the same effective width, where each pixel subtends an equal horizontal angle to the center of said panoramic image;
      3) a means for generating an image representing a subset of the said 360 degree panoramic image, whereby the azimuth and elevation of the center of said subset is adjustable by user control;
      4) a means for selectively combining and geometrically altering real time imagery from said capture devices and prerecorded imagery to create a composite augmented reality experience;
      5) a means for determining the correct location of said image sections within said 360 degree panoramic image utilizing additional information present in the source media;
      6) a means for inserting tracking information to describe at least the current orientation of said array of electronic image capture devices into an outgoing video stream;
      7) a means for encoding multi-track audio such that it maintains compatibility with standard video storage, playback and transmission systems; and
      8) a means for producing orientation-sensitive audio in real-time, utilizing multi-track audio information and controlled by coordinates of a viewport within said panoramic image;
   d) an image output means capable of outputting an image in a format compatible with existing television standards;
   e) an audio output means capable of outputting at least 2 channels of audio;
   f) a display means including at least one display device;
   g) a user control means including an input device allowing the user to control said signal processing means; and
   h) a tracking means capable of measuring at least azimuth and elevation of said array of electronic image capture devices.

2) The system according to claim 1 further comprising signal processing means for applying distortion correction to the images, wherein each pixel in the resulting 360 degree panoramic image has the same effective height, where each pixel subtends an equal vertical angle to the center of said panoramic image.

3) A system according to claim 1 in which said display means is a conventional television type display device and the user input means is an infrared or radio based manually operated remote control device.

4) A system according to claim 1 in which said display means is a helmet mounted display device and the user input means is an automatic tracking device that calculates at least azimuth and elevation of the user's head.

5) A system according to claim 1 which utilizes a modified television protocol comprising a plurality of video fields or frames such that each field or frame includes at least one of graphical data, sound data, and control information, wherein the signal from said image playback means is compatible with at least one widely accepted television standard.

6) A system according to claim 5 wherein said modified television protocol further comprises, within one or more scan lines of a standard video image, additional coded data defining control parameters and image manipulation data for a signal processing means.

7) A system according to claim 5 wherein said graphical data comprises sections of said 360 degree panoramic image.

8) A system according to claim 5 further comprising, within one or more scan lines of a standard video image, additional coded data providing information defining the placement position of image sections within said 360 degree panoramic image.

9) A system according to claim 5 further comprising within one or more scan lines of a standard video image, additional coded data providing information for the generation of four or more real-time audio tracks.

10) A system according to claim 5 further comprising within one or more scan lines of a standard video image, additional coded data providing a) audio information for generation of four or more real-time audio tracks; and b) data descriptive of a number of employed audio tracks, an employed audio data format, an employed audio sampling rate, and track synchronization, whereby said signal processing means can decode the audio information into position and orientation sensitive sound.

11) A system according to claim 5 further comprising, within one or more scan lines of a standard video image, additional coded data which provides information as to absolute orientation and X-Y-Z position of said capture device array.

12) A system according to claim 1 further comprising a) means for mathematically combining information about azimuth and elevation of a viewer; and b) means for encoding multi-track audio for use with standard video storage and transmission systems such that the combined information can be subsequently decoded by specific hardware to produce a left and right audio channel with spatially correct three-dimensional audio for the left and right ears of a viewer.

13) A system according to claim 1 further comprising means for varying angular field of view of said viewport within said panoramic image responsive to runtime user control.

14) A system according to claim 1 further comprising means for varying the position of a viewpoint within a three-dimensional virtual space responsive to runtime user control.

15) A system according to claim 1 further comprising: a) a tracking device for continuously calculating a viewer's physical position; and b) means for varying the position of a viewpoint within a three-dimensional virtual space responsive to said position.

16) A system according to claim 1 further comprising means for providing orientation-sensitive audio in real-time, controlled by the direction of the viewer's head.

17) A system according to claim 1 further comprising means for providing orientation-sensitive audio in real-time, controlled by coordinates of a viewport within said panoramic image.

18) A system according to claim 1 further comprising means for providing position-sensitive audio in real-time, controlled by the virtual position of a viewpoint within a three-dimensional virtual space.

19) A system according to claim 1 wherein said signal processing means comprises a) one or more video digitizing modules; b) one or more memory areas selected from the group consisting of ARM, VRM, and TM; c) digital processing means for 1) altering address mapping of data held in at least one of ARM and VRM so as to effectively move graphical information from one location to another therein; and 2) mathematically combining and altering data from both a source location and a destination location, thereby achieving the functions of compositing and transformation; and d) one or more video generation modules.

20) A system according to claim 19 wherein said ARM is mapped to occupy a smaller vertical field of view than said VRM and said TM, thereby reducing the amount of data required for the generation of a high-quality image.

21) A system according to claim 19 further comprising means for mapping ARM, VRM, and TM at different resolutions, whereby pixels in each memory region can represent different degrees of angular deviation.

22) A system according to claim 1 further comprising a) means for displaying imagery; b) means for placing said real-time video imagery into ARM and source information from said video playback means into VRM; and c) means for combining imagery from ARM and VRM according to a pattern of data held in TM into a composite image before display.

23) A system according to claim 1 further comprising: a) means for displaying imagery; b) means for placing source information from said video playback means into ARM and VRM; and c) means for combining imagery from ARM and VRM according to a translation map included in the source media.

24) The system according to claim 1 further comprising a) means for displaying imagery; b) means for placing source information from said video playback means into ARM and VRM; and c) means for combining imagery from ARM and VRM in accordance with a geometric interpretation of said real-time video imagery.

25) A system according to claim 1 further comprising signal processing means for inserting identification information to describe the location of individual image sections that comprise said 360 degree panoramic image into said outgoing video stream.

26) A system according to claim 1 wherein said tracking information also describes the current spatial position of said array of electronic image capture devices into said outgoing video stream.

27) A system according to claim 1 whereby said signal processing means utilizes data received from said array of electronic image capture devices and, by performing a series of image analysis processes, calculates changes in the orientation of said array of electronic image capture devices.

28) A system according to claim 1 whereby said signal processing means utilizes data received from said array of electronic image capture devices and, by performing a series of image analysis processes, calculates changes in the position of said array of electronic image capture devices.

29) A system according to claim 1 wherein said tracking means comprises a) a plurality of reflective targets placed at predetermined coordinates; b) a plurality of on-axis light sources strobed in synchronization with the capture rate of said array of electronic image capture devices; and c) means for computing absolute angular and spatial data based on said predetermined coordinates and relative angular and spatial data determined by said array of electronic image capture devices.

30) A system according to claim 29 further comprising a plurality of color filters positioned over said reflective targets, whereby the ability of said system to correctly identify and maintain tracking of said reflective targets is improved.

31) A system according to claim 29 wherein said light sources are color-controllable, whereby the ability of the system to correctly identify and maintain tracking of said reflective targets is improved.

32) A system according to claim 1 wherein said tracking means incorporates active beacons which utilize at least one of pulse timing and color of light to transmit spatial coordinates of each beacon to said array of electronic image capture devices, whereby relative angular and spatial data can be determined by said array of electronic image capture devices and converted into absolute angular and spatial data.